Single late-write for standard synchronous SRAMs

ABSTRACT

Synchronous SRAM may conform to Std. Sync or early-write at an external interface whilst providing late-write internally.

TECHNICAL FIELD

Embodiments of the present invention relate to the field of semiconductor memory devices and, in particular, to write-cycle techniques in memory devices.

BACKGROUND

SRAM (static random access memory) devices are well known in the computer arts and their use is commonplace. SRAM may be Synchronous, i.e. clocked by edges of a clock, typically a master clock. In particular, in Synchronous SRAM, Addresses may be strobed at a rising clock edge, subject to customary set-up and hold constraints.

From a memory device (chip) user's perspective, various architectures are available and there are various trade-offs to take into account when selecting the memory architecture to use. For example, flow-through SRAM devices may present read-data one clock cycle after the memory address is presented. Pipelined SRAM devices may typically present read-data two clock cycles after the memory address is presented.

Std. Sync (standard synchronous) SRAM devices may typically expect simultaneous presentation of addresses and corresponding write-data at the same clock edge. “Single Late-Write” SRAM devices may typically expect write-data to be presented one clock cycle after the corresponding address is presented. NoBL™ (“No Bus Latency™”) SRAM devices may typically expect write-data to be presented two clock cycles after the corresponding address is presented. Burst modes are also used so that a single address setup and hold may initiate the transfer of multiple consecutively addressed words of data.

These various architectures have associated trade-offs. Trade-offs may include price, clock speed, read-write turnaround times, latency, circuit complexity and more. NoBL™ SRAMs may have advantages where latency is an important consideration by reducing or eliminating deselect cycles. Std. Sync SRAM devices may be particularly attractive in caching applications, such as where price and read-speed are the dominant factors.

There are also legacy considerations. Both suppliers and users become invested over time, in various ways, in particular interfaces; thus, there can be strong incentives to provide improved but compatible devices.

FIG. 1 shows a block diagram of part of the write-data circuit of a conventional Std. Sync. SRAM. Write-data may be presented at an external I/O pad 101, and clocked into a Master/Slave register 102. The data may then travel down on-chip data-in (di) lines 103 which, for example, may typically be about 6000 microns long. A write-driver circuit 104 may then transfer the data to the memory core 120 responsive to a write pulse signal (wps) from a delay circuit 105. The paths corresponding to time delays tD1, tD2 are discussed below, in connection with FIG. 3. As is conventional in similar block diagrams, only a single data or address line may be shown but the presence of multiple lines and signals in parallel may be inferred or assumed wherever relevant or apparent as is well-known in the art.

FIG. 2 shows a block diagram of part of a conventional Std. Sync. SRAM circuit. As contrasted with FIG. 1, it shows additional parts of what may be the same type of device, including part of a read-data circuit and an address circuit. In addition to the features of FIG. 1 there is an Address-Pad 201 to receive addresses (encoded as voltages) from off-chip, from where they may be clocked into Address Register 208 and thence to address decoder 209 for address selection in memory core 120. Also shown are data-out lines (do) 203 which convey read-data from the memory core 120 through write driver 104 to a pipeline register 202 and thence to I/O pad 101. The data-in lines 103 (di) switch (i.e. experience data-dependent transitions or non-transitions) even during read cycles. The paths corresponding to time delays tD1, tD2 are discussed below, in connection with FIG. 3.

FIG. 3 shows a timing diagram for a conventional SRAM, such as that of the device partially represented by FIGS. 1 and 2. It can be seen that Address and corresponding Write-Data (e.g. A0/D0) are presented effectively simultaneously. Time tD1 is equivalent to the time for data to travel from I/O pad 101 to registers 102, 208 (FIG. 1 or FIG. 2). More precisely, time tD1 is equivalent to the time to become stabilized rather than marginal or meta-stable, at the input to the registers 102, 208 (FIG. 1 or FIG. 2) responsive to a voltage transition. Time tD2 may be equivalent to the time for data to travel down the on chip data-in lines (di) 103 (FIG. 1 or FIG. 2). Time tW represents the time for the Write operation into the memory core and tR the time for Write recovery of the memory core. Other signals in FIG. 3 such as ck (data and address clock) and wps (write-pulse) are conventional in the art.

Still referring to FIG. 3 the master clock xclk is shown operating to generate timing signals W at the highest allowed frequency which is the inverse of the sum of the serially sequenced activity times, i.e. tCYCW(min)=tD1+tD2+tW+tR and f(max)=1/tCYCW(min) (seconds and Hz respectively). Thus, this configuration of SRAM provides early-write at the external interface and actually performs an early-write (as contrasted with, for example, NoBL™ devices (which may be characterized as double late-write)). SRAMs of this kind may be said to have an external interface according to static random access memory standard synchronous timings for early-write (or Std. Sync SRAM for short).
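As a worked illustration of the relation above, the following minimal Python sketch computes tCYCW(min) and f(max) from the four serially sequenced delays. The function name and the delay values are assumptions chosen only for illustration; they are not taken from any actual device.

```python
def conventional_write_cycle(t_d1, t_d2, t_w, t_r):
    """Return (tCYCW(min), f(max)) for the conventional early-write timing.

    The four activities occur in series, so the minimum cycle time is
    their sum: tCYCW(min) = tD1 + tD2 + tW + tR.
    Times are in seconds; the frequency is in Hz.
    """
    t_cyc_min = t_d1 + t_d2 + t_w + t_r
    return t_cyc_min, 1.0 / t_cyc_min


# Hypothetical delays in seconds, for illustration only.
t_cyc, f_max = conventional_write_cycle(0.5e-9, 1.0e-9, 1.5e-9, 1.0e-9)
print(f"tCYCW(min) = {t_cyc * 1e9:.2f} ns, f(max) = {f_max / 1e6:.1f} MHz")
```

With these placeholder values every delay contributes directly to the cycle time, which is the property the conventional configuration exhibits.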

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements and in which:

FIG. 1 shows a block diagram of part of the write-data circuit of a conventional Std. Sync. SRAM.

FIG. 2 shows a block diagram of part of a conventional Std. Sync. SRAM circuit.

FIG. 3 shows a timing diagram for a conventional SRAM, such as that of FIG. 1.

FIG. 4 shows a block diagram of part of a write data circuit of a Std. Sync. SRAM circuit according to an embodiment of the invention.

FIG. 5 shows a block diagram of another part of a Std. Sync. SRAM circuit according to an embodiment of the invention.

FIG. 6 shows a timing diagram for a Std. Sync. SRAM circuit according to an embodiment of the invention.

FIG. 7 shows a timing diagram that contrasts timings for a conventional Std. Sync. SRAM circuit with a Std. Sync. SRAM circuit according to an embodiment of the invention.

DETAILED DESCRIPTION

In the following description, numerous details are set forth to provide a more thorough explanation of embodiments of the present invention. It will be apparent, however, to one skilled in the art, that embodiments of the present invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring embodiments of the present invention.

Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification do not necessarily all refer to the same embodiment.

FIG. 4 shows a block diagram of part of a write-data circuit of a Std. Sync. SRAM circuit according to an embodiment of the invention. As in FIG. 1, an external I/O pad 101 is coupled to a Master/Slave register 102 which receives a clock signal ck from a clock which may be a system master clock. Write-data may be presented at the external I/O pad 101, and clocked into the Master/Slave register 102.

Contrasting FIG. 4 with FIG. 1, it is apparent that latch 420 has been added to the circuit. Latch 420 is coupled between Master/Slave register 102 and data-in lines (di) 103 (which are switched only in write cycles). Latch 420 is shown to be in close proximity to register 102 (as shown in FIG. 4 the line connecting latch 420 to register 102 may be, for example, less than 50 microns in length). Latch 420 is controlled by a write-latching signal ckw which allows the data-in lines (di) 103 to be held constant (latched) during read-cycles.

The write latching signal ckw may be generated by a write latching clock signal generating circuit 428 (or write latching signal generator for short), for example by using a gated pulse generator coupled to the Latch 420. The write latching signal generator 428 may receive the clock signal ck and may be gated by a W (write) control signal using well-known digital techniques for pulsed signals.
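The gating described above may be sketched behaviorally as follows. This is a minimal Python model under stated assumptions, not a circuit description: the waveforms are lists of sampled 0/1 values, and the function name generate_ckw and the delay and width parameters (in samples) are hypothetical.

```python
def generate_ckw(ck, w, delay=1, width=2):
    """Return a sampled ckw waveform (list of 0/1) from sampled ck and W.

    A pulse of `width` samples is emitted `delay` samples after each rising
    edge of ck at which W (write) is asserted. No pulse is emitted for read
    cycles, so a latch controlled by ckw holds its value through them.
    """
    ckw = [0] * len(ck)
    for i in range(1, len(ck)):
        rising = ck[i - 1] == 0 and ck[i] == 1
        if rising and w[i]:
            for j in range(i + delay, min(i + delay + width, len(ck))):
                ckw[j] = 1
    return ckw


# Three clock cycles of six samples each: write, read, write.
ck = [0, 1, 1, 1, 0, 0] * 3
w = [1] * 6 + [0] * 6 + [1] * 6
ckw = generate_ckw(ck, w)
print(ckw)
```

In this sketch the middle (read) cycle produces no ckw pulse, which corresponds to the data-in lines being held constant during read-cycles.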

Still referring to FIG. 4, the data may then travel down on chip data in (di) lines 103 to a write-driver circuit 104 in a similar manner to that of the circuit of FIG. 1. The write-driver circuit 104 may then transfer the data to the memory core 120 to which it is coupled, responsive to a write pulse signal (wps).

Time tD1 is equivalent to the time for data to travel from I/O pad 101 to register 102. Time tD2 may be equivalent to the time for data to travel down the on chip data in lines (di). Time tW represents the time for the Write operation into the memory core and tR the time for Write recovery of the memory core.

Still referring to FIG. 4, this action of holding data-in lines (di) constant during read cycles may act to conserve power, especially whenever the usage of the device is predominantly read-cycles with only a relatively small proportion of write-cycles. This fits well with the caching applications to which Std. Sync SRAM is commonly applied, but still has significant benefit in other applications. The benefit of reducing power usage has multiple facets such as improved reliability and, in particular, improved speed, such as where the by-design maximum clock speed of the device may have been limited by thermal considerations. Thus, some embodiments of the invention enable price/performance improvements to be made to SRAMs.
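The effect of holding the data-in lines during read cycles can be illustrated by counting bit toggles on a modeled di bus. The following Python sketch uses toggle count as a rough proxy for switching activity, which is an assumption of this illustration; the function names, the initial held value of zero and the example word values are likewise hypothetical.

```python
def count_transitions(values):
    """Count bit toggles between consecutive words on a bus."""
    return sum(bin(a ^ b).count("1") for a, b in zip(values, values[1:]))


def di_without_latch(register_values):
    """Conventional path: di follows the register on every cycle."""
    return list(register_values)


def di_with_latch(register_values, is_write):
    """Latched path: di changes only on write cycles, otherwise it holds."""
    di, held = [], 0  # assumed initial latch content
    for value, write in zip(register_values, is_write):
        if write:
            held = value
        di.append(held)
    return di


# Hypothetical traffic: one write, three reads, one write.
register_values = [0x0F, 0xF0, 0x0F, 0xF0, 0xFF]
is_write = [True, False, False, False, True]

toggles_old = count_transitions(di_without_latch(register_values))
toggles_new = count_transitions(di_with_latch(register_values, is_write))
print(toggles_old, toggles_new)
```

In this example the unlatched bus toggles on every cycle, while the latched bus toggles only when the second write arrives.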

FIG. 5 shows a block diagram of another part of a Std. Sync. SRAM circuit according to an embodiment of the invention. The circuit of FIG. 5 includes an Input Circuit 599 and a memory core 120. Address Pad 201 may receive addresses (encoded as voltages) from off-chip from where they are clocked into Address Register 208 having passed through conductors coupling the two components.

From Address Register 208 the address may pass through coupling conductors to be latched into Address Latch 510 under control of the ckw signal (which may be active only during write-cycles). Write-Latching signal ckw may be generated by a signal generating circuit (such as circuit 428 of FIG. 4, but not shown in FIG. 5).

Still referring to FIG. 5, MUX (2:1 multiplexer) 514 is connected to memory core 120 to drive the address lines presented to the memory core 120. Responsive to (for example) a Read/Write (R/W) signal, MUX 514 may select a memory address to be presented from either of the source devices to which it is connected. That is, MUX 514 may select a latched address held by Address Latch 510 or an address held by Address Register 208 which may be a D-type flip-flop in some embodiments. Typically, the latched address will be selected during write-cycles, the registered address during read-cycles.
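The address path just described may be modeled behaviorally as below. This is a minimal Python sketch under stated assumptions: the class and method names are hypothetical, the transparent latch is reduced to a single copy operation per ckw pulse, and addresses are plain integers.

```python
class AddressPath:
    """Behavioral sketch of Address Register 208, Address Latch 510 and MUX 514."""

    def __init__(self):
        self.registered = None  # content of Address Register 208
        self.latched = None     # content of Address Latch 510

    def clock_edge(self, address):
        """Rising edge of ck: capture the address presented at the pad."""
        self.registered = address

    def ckw_pulse(self):
        """ckw pulse (write cycles only): the latch follows, then holds, the register."""
        self.latched = self.registered

    def core_address(self, is_write):
        """MUX 514: latched address for write cycles, registered address for reads."""
        return self.latched if is_write else self.registered


path = AddressPath()
path.clock_edge(0x10)   # write cycle: address captured
path.ckw_pulse()        # and latched
path.clock_edge(0x20)   # read cycle: no ckw pulse, latch unchanged
print(hex(path.core_address(is_write=True)), hex(path.core_address(is_write=False)))
```

The usage lines show the latched write address surviving a later read address, which is what allows the write to the core to be deferred.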

As may be seen in connection with the discussion of device timings (below), this configuration permits write addresses to be held so that a late-write may be made to the memory core 120 even though the device as a whole conforms to early-write or Std. Sync specifications at its external interface (rather than late-write or NoBL™ external specifications, for example).

Also shown in FIG. 5 is a Write-Read turnaround bypass circuit for correctly handling (without additional delays) circumstances wherein a “Read” intends to read data from the same address that was written immediately beforehand. The Write-Read turnaround bypass circuit substantially consists of match signal generator 522 and MUX 524. The match signal generator 522 (stylistically represented as an XOR gate) may be coupled to, and accept input from, both Address Latch 510 and Address Register 208. The match signal generator 522 may be cascaded XOR gates or have some functional equivalent so that the “match” signal is asserted whenever all (not merely some) of the significant registered address lines match the corresponding latched address lines. This enables Read-data to be registered for output from the chip (by read-register 528 coupled to MUX 524) having been selected by MUX 524 (responsive to the “match” signal) from the data-in lines (di) to which it is connected. Ordinarily (no “match”) MUX 524 will select the data from its coupled data-out lines (do), i.e. data being read out from the memory core.

Regardless of whether the read data is read out of memory core (do) or is write data (di) (e.g., di<0>, di<1>, di<2>) received as a result of operation of the bypass circuit, the read register pipelines output data to the connected Input-Output Pad 101. Although a circuit is shown in FIG. 5 for a pipelined read arrangement, similar circuits for flow-through architectures may readily be created by anyone of ordinary skill in the art.
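The Write-Read turnaround bypass described for FIG. 5 may be sketched in Python as follows. This is a behavioral illustration only: the function names are hypothetical, addresses and data are modeled as integers, and the default of 18 significant address lines is an arbitrary assumption.

```python
def match_signal(registered_addr, latched_addr, width=18):
    """Assert match only when every significant address bit is equal.

    XOR yields a 1 for each differing bit; masking to `width` bits keeps
    only the significant lines, and match is asserted when none differ.
    """
    mask = (1 << width) - 1
    return ((registered_addr ^ latched_addr) & mask) == 0


def read_mux(registered_addr, latched_addr, di, do, width=18):
    """MUX 524: select pending write data (di) on a match, else core data (do)."""
    return di if match_signal(registered_addr, latched_addr, width) else do


# Read of the address just written: the bypass returns the data on di.
same = read_mux(0x2A, 0x2A, di=0xBEEF, do=0x0000)
# Read of a different address: data comes from the core on do.
other = read_mux(0x2B, 0x2A, di=0xBEEF, do=0x1234)
print(hex(same), hex(other))
```

A single differing significant bit is enough to deassert the match, which mirrors the requirement that all (not merely some) address lines agree.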

FIG. 6 shows a timing diagram for a Std. Sync. SRAM circuit according to an embodiment of the invention. It can be seen that Address A0 and corresponding Data D0 are both presented at a time 620 of rising edge of clock xclk. The clock signal xclk may be that of a system master clock. Address A2 and corresponding data D2 are both presented at a time 690 of the rising edge of clock xclk.

After time period tD1, at time 630, the input Address signals and corresponding write-data signals have “traveled” to (i.e. sufficiently stabilized at) the respective registers 208 and 102, which capture these signals responsive to the rising edge of signal ck.

Next, starting at time 640, a latching pulse (of the ckw signal) activates both Address Latch 510 and Write-data Latch 420 to follow the respective signals. Since FIG. 6 is for timings of write-cycles, the Address MUX 514 is selecting output from Address Latch 510 rather than from Address Register 208. After time period tD2, at time 650, the falling edge of ckw leaves the Address and Write-data signals latched in latches 510 and 420, respectively (hence no longer followed). Also, at around the same time, the write-data has had sufficient time to travel (perhaps up to about 6000 microns or 6 mm in an embodiment) down the on-chip data-in lines (di) and has become stable at the memory core. Since ckw is, for the time being, low, the signals on (di) and (ai) remain stable and do not change significantly even though, at time 660, a second write-cycle begins (for A1/D1). It will be apparent that the write latching signal ckw is active during a latching period that starts after completion of an immediately prior write cycle and ends prior to writing the write data to the memory core. Moreover, the latching pulse may need to be timed so as to receive the appropriate Data/Address from the respective registers at valid times.

Functional equivalents, such as a negative logic ckw signal will, of course, be apparent to those of ordinary skill.

Still referring to FIG. 6, at time 670, a write pulse wps signal is active for period tW that writes data D0 into address A0 and meanwhile the next set of addresses A1 and write-data D1 are being processed for the next write-cycle. As can be seen, the write pulse wps for a prior cycle ends (at time 680) before the latching signal ckw is asserted (at time 640) and therefore at least two cycles may safely overlap without detrimental mutual interference. After writing to the memory core a Write recovery period tR (ending at time 690) may be allowed to elapse, and the Write recovery period may even be allowed to encroach slightly into the next write-cycle (which writes data D1 into address A1).

Thus, Write-Data is received in one clock cycle and late-written to the memory core in a subsequent clock cycle. Hence, in the example timings of FIG. 6, a write cycle lasts only one clock cycle. Data which is received in one write cycle is written to the memory core in the next (subsequent) write cycle. The overall effect is that embodiments of the invention may conform to Std. Sync (or early-write) timings at an external interface but nonetheless perform late-writes to the memory core. It may be seen that time periods tR and tD2 may overlap and also tW and tD1 may overlap, provided a required sequence of events is followed. The net effect is that the maximum operating frequency of the device is the inverse of tCYCW(min)=tW+tR. Because tD1 and tD2 no longer add to the minimum cycle time, this maximum operating frequency exceeds that of the conventional configuration of FIG. 3, for which tCYCW(min)=tD1+tD2+tW+tR.
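The externally visible behavior summarized above (early-write at the interface, late-write to the core) may be illustrated with a small behavioral model. The Python sketch below is written under stated assumptions: the class name is hypothetical, the memory core is modeled as a dictionary, and the pending write is assumed to be committed when the next write cycle arrives, which is one reading of the description rather than a statement of the circuit's exact timing.

```python
class LateWriteSram:
    """Behavioral sketch: early-write interface, late-write to the core."""

    def __init__(self):
        self.core = {}        # memory core 120, modeled as address -> data
        self.pending = None   # (address, data) held in latches 510 and 420

    def write(self, address, data):
        """Accept address and data together; commit the previous write."""
        if self.pending is not None:
            prev_address, prev_data = self.pending
            self.core[prev_address] = prev_data  # late-write of prior cycle
        self.pending = (address, data)

    def read(self, address):
        """Return pending write data on an address match, else core data."""
        if self.pending is not None and self.pending[0] == address:
            return self.pending[1]               # Write-Read bypass
        return self.core.get(address)


sram = LateWriteSram()
sram.write(0xA0, "D0")   # D0 is held, not yet in the core
sram.write(0xA1, "D1")   # D0 is committed, D1 is now held
print(sram.read(0xA0), sram.read(0xA1), 0xA1 in sram.core)
```

From the point of view of a user of the model, reads always return the most recently written data even though the core itself lags by one write.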

Thus, not only may embodiments of the invention provide improved price/performance by better energy management, they also may provide improved price/performance by improved parallel processing.

FIG. 7 shows a timing diagram that contrasts timings for a conventional Std. Sync. SRAM circuit (labeled “OLD”) with a Std. Sync. SRAM circuit according to an embodiment of the invention (labeled “NEW”) in which write-data D0 is written into address A0, write-data D1 is written into address A1 and write-data D2 is written into address A2. Master clock xclk and Address and Data lines are shown to be the same for both OLD and NEW exemplary timings. The other signals—ck, di, wps and ckw have the same significance in this figure as described above for the other figures.

Still referring to FIG. 7, the minimum write cycle times (inverse maximum operating frequencies) tCYC(min) for OLD and NEW exemplary timings are labeled tCYC(OLD) and tCYC(NEW) respectively. The speed advantage of embodiments of the invention is readily apparent in that tCYC(OLD) is shown in FIG. 7 to be a longer time interval than tCYC(NEW). Hence, in the example timings of FIG. 7, the “OLD” implementation is running at or near maximum speed but the “NEW” implementation could successfully be run significantly faster.
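The contrast between the OLD and NEW minimum cycle times can be put in numerical form using the relations given earlier in this description, tCYCW(min)=tD1+tD2+tW+tR for the conventional device and tCYCW(min)=tW+tR for an embodiment. The following Python sketch is illustrative only: the function name and the delay values (in picoseconds) are hypothetical, and it assumes that tD1 and tD2 fit within the overlap described for FIG. 6.

```python
def cycle_times(t_d1, t_d2, t_w, t_r):
    """Return (tCYC(OLD), tCYC(NEW)) in the units of the inputs.

    OLD: the four activities are serial, so tCYC(OLD) = tD1 + tD2 + tW + tR.
    NEW: tD1 and tD2 overlap tW and tR of the prior write, so
         tCYC(NEW) = tW + tR.
    """
    return t_d1 + t_d2 + t_w + t_r, t_w + t_r


# Hypothetical delays in picoseconds, for illustration only.
old, new = cycle_times(500, 1000, 1500, 1000)
speedup = old / new
print(f"tCYC(OLD) = {old} ps, tCYC(NEW) = {new} ps, ratio = {speedup:.2f}")
```

The ratio depends entirely on how large tD1 and tD2 are relative to tW and tR, so the placeholder figures should not be read as a performance claim.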

Thus, in one embodiment, a synchronous memory is configured to receive an address and corresponding write-data in the same clock cycle of a clock signal in response to the clock signal, where the synchronous memory includes a memory core and an input circuit coupled to the memory core, where the input circuit is configured to receive the write data in a current write cycle and to complete writing the write data, received in the current cycle, to the memory core in a subsequent write cycle, and where the current cycle and the subsequent cycle are synchronous to the clock signal.

In one embodiment, the input circuit also includes a data register to receive the write data and a data latch to receive the write data from the data register. In one embodiment, the data latch latches data only during write cycles.

In one embodiment, the synchronous memory also includes a multiplexer to select selected read data from an output of the data latch and an output of the memory core and a register to receive the selected read data. In one embodiment, the synchronous memory also includes an address register to receive the address and to generate a registered address, an address latch to receive the registered address and to generate a latched address, and a multiplexer to select, from the registered address and the latched address, an address for accessing the memory core.

In one embodiment, the synchronous memory also includes a clock signal generating circuit to generate a write latching signal that is asserted throughout a latching period that starts after completion of a memory core writing phase of an immediately prior write cycle, where a plurality of memory core data-in signals are generated by the data latch only during write cycles and are responsive to the write latching signal.

In one embodiment, the address latch and the data latch perform latching during a latching period that starts after completion of an immediately prior write cycle and ends prior to writing the write data to the memory core and, further where the latching period is timed for receiving the registered address and the write data at a respective valid time.

In one embodiment, the synchronous memory is a static random access memory (SRAM), where the static random access memory has a pipelined output. In one embodiment, the static random access memory has flow-through output.

In one embodiment, a method for single late-write for standard synchronous SRAMs includes: 1) receiving an address and corresponding write data in a current write cycle of a synchronous memory device configured to receive the address and the corresponding write data responsive to a clock signal and in a same clock cycle of the clock signal; and 2) completing writing the write data, received in the current cycle, to a memory core of the synchronous memory device in a subsequent write cycle, where the current cycle and the subsequent cycle are synchronous to the clock signal.

In one embodiment, the method also includes: 1) generating a write latching signal that is asserted throughout a latching period that starts after completion of a memory core writing phase of an immediately prior write cycle; 2) selecting between a latched copy of the address and an unlatched copy of the address responsive to a Read/Write signal; and 3) latching a plurality of memory core data-in signals only during write cycles and responsive to the write latching signal.

In one embodiment, the method further includes pipelining read data output from the memory core and/or flowing-through read data output from the memory core.

One embodiment of a standard synchronous SRAM configured for a single late-write includes: 1) a memory core; 2) means for receiving write data at an external interface according to static random access memory standard synchronous timings for early-write, the means also for late-writing the write data to the memory core, the late-writing being relative to the standard synchronous timings for early-write, where the memory core includes a plurality of data-in signal conductors to receive the write data and the means is also for changing voltages on the data-in signal conductors during write cycles exclusively; and 3) a Write-Read turnaround bypass circuit.

In the foregoing specification, embodiments of the invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the invention as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. 

1. An apparatus, comprising: a synchronous memory configured to receive an address and corresponding write data in a same clock cycle of a clock signal, and responsive to said clock signal, wherein the synchronous memory comprises: a memory core; and an input circuit coupled to the memory core, the input circuit to receive the write data in a current write cycle and complete writing the write data, received in the current cycle, to the memory core in a subsequent write cycle, wherein the current write cycle and the subsequent write cycle are synchronous to the clock signal, wherein the input circuit comprises: a data register to receive the write data; and a data latch to receive the write data from the data register; a multiplexer to select selected read data, from an output of the data latch and an output of the memory core; and a register to receive the selected read data.
 2. The apparatus of claim 1 wherein: the data latch latches data only during write cycles.
 3. The apparatus of claim 1 wherein the input circuit further comprises: an address register to receive said address and generate a registered address; an address latch to receive the registered address and generate a latched address; and a multiplexer to select, from the registered address and the latched address, an address for accessing the memory core.
 4. The apparatus of claim 3 further comprising a clock signal generating circuit to generate a write latching signal that is asserted throughout a latching period that starts after completion of a memory core writing phase of an immediately prior write cycle.
 5. The apparatus of claim 4 wherein a plurality of memory core data-in signals are generated by the data latch only during write cycles and responsive to the write latching signal.
 6. The apparatus of claim 3 wherein the address latch and the data latch perform latching during a latching period that starts after completion of an immediately prior write cycle and ends prior to writing the write data to the memory core and further wherein the latching period is timed for receiving the registered address and the write data at a respective valid time.
 7. The apparatus of claim 1 wherein the synchronous memory is static random access memory (SRAM).
 8. The apparatus of claim 7 wherein the static random access memory has pipelined output.
 9. The apparatus of claim 7 wherein the static random access memory has flow-through output.
 10. A method comprising: receiving an address and corresponding write data in a current write cycle of a synchronous memory device configured to receive said address and said corresponding write data responsive to a clock signal and in a same clock cycle of said clock signal; generating a write latching signal that is asserted throughout a latching period that starts after completion of a memory core writing phase of an immediately prior write cycle; and completing writing the write data, received in the current write cycle, to a memory core of the synchronous memory device in a subsequent write cycle, wherein the current write cycle and the subsequent write cycle are synchronous to the clock signal.
 11. The method of claim 10 further comprising selecting between a latched copy of said address and an unlatched copy of said address responsive to a Read/Write signal.
 12. The method of claim 10 further comprising latching a plurality of memory core data-in signals only during write cycles and responsive to the write latching signal.
 13. The method of claim 10 wherein the synchronous memory is static random access memory.
 14. The method of claim 13 further comprising pipelining read data output from the memory core.
 15. The method of claim 13 further comprising flowing-through read data output from the memory core. 